Inverse trigonometric functions are commonly used in computational geometry. Unfortunately, they are among the slowest floating-point operations a CPU performs, making them a major bottleneck in high-volume geometric data processing. In this article we will see how to speed them up by sacrificing precision that many use cases do not need.
We focus our attention on the arc-cosine function \(\arccos\) since all inverse trigonometric functions can easily be expressed in terms of each other (for how to implement \(\arcsin\) and \(\arctan\), see the code below). \(\arccos\), depicted in the first graph on the right, is defined on the interval \([-1, 1]\) and is rotationally symmetric about its midpoint, \((0, \frac{\pi}{2})\), so we need only compute it on the interval \([0, 1]\).
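Concretely, the reductions rest on standard identities: $$\arccos(-x) = \pi - \arccos(x), \qquad \arcsin(x) = \frac{\pi}{2} - \arccos(x), \qquad \arctan(x) = \arcsin\!\left(\frac{x}{\sqrt{1+x^2}}\right).$$ The first of these is the symmetry that lets us restrict attention to \([0, 1]\).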
To approximate \(\arccos\) on \([0, 1]\), we turn to the field of numerical approximation. One of the most basic methods of approximation is the truncated Taylor series, or Taylor polynomial. As \(\arccos(x) - \frac{\pi}{2}\) is an odd function, its Taylor polynomial centered at \(x=0\) has only odd-degree terms, causing it to converge very fast in an interval around \(0\). The second graph on the right shows Taylor polynomials of various degrees. The red degree-one approximation doesn't perceptibly deviate from the black graph of \(\arccos\) until \(x=0.3\); the orange third-degree approximation hugs \(\arccos\) until \(x=0.5\); and so on.
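Concretely, the series being truncated is $$\arccos(x) = \frac{\pi}{2} - x - \frac{1}{6}x^3 - \frac{3}{40}x^5 - \frac{15}{336}x^7 - \cdots,$$ so the degree-one approximation, for example, is simply \(\frac{\pi}{2} - x\).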
However, the Taylor approximations break down near \(x=1\). \(\arccos\)'s slope approaches vertical there, yet polynomials can't have vertical slopes. Fitting a rational function to \(\arccos\) gives us the Padé approximant (third graph on the right), which is slightly tighter at the same degree but still fails for the same reason.
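For illustration (this particular approximant is my own choice, not necessarily the one graphed), matching the Taylor series through degree five with a rational function of the form \(\frac{\pi}{2} - x\,\frac{1 + a x^2}{1 + b x^2}\) gives $$\arccos(x) \approx \frac{\pi}{2} - x\,\frac{1 - \frac{17}{60}x^2}{1 - \frac{9}{20}x^2},$$ which hugs \(\arccos\) a little longer than the degree-five Taylor polynomial but, having a finite slope at \(x = 1\), still pulls away there.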
While polynomials and rational functions can't have a vertical slope, fractional powers of \(x\) can: the slope of \(\sqrt{x}\) at 0 is infinite, just like that of \(\arccos\) at 1. The Puiseux series is a generalization of the Taylor series that allows fractional powers of \(x\). The lowest-degree Puiseux approximation to \(\arccos\), a shifted and scaled copy of \(\sqrt{x}\), is the red curve on the Puiseux approximation graph; it is already close to \(\arccos\) near 1, and adding further terms only improves it. Indeed, truncating the Puiseux series after a small number of terms approximates \(\arccos\) very well precisely where the Taylor polynomial has the most trouble.
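Written out, the Puiseux expansion of \(\arccos\) about \(x = 1\) is, with \(t = 1 - x\), $$\arccos(1 - t) = \sqrt{2t}\left(1 + \frac{1}{12}t + \frac{3}{160}t^2 + \frac{5}{896}t^3 + \cdots\right),$$ whose leading term \(\sqrt{2t} = \sqrt{2(1-x)}\) is exactly the shifted and scaled square root described above.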
For the application I had in mind, I chose a 6-term, degree-11 Taylor approximation and a 6-term Puiseux approximation. The former approximates \(\arccos\) well around 0 and degrades for larger \(x\); the latter approximates it well around 1 and degrades for smaller \(x\). Putting the two together gives a good approximation on the entire interval \([0, 1]\).
The only remaining question is at what point to cross over from one approximation to the other. Running a bisection algorithm yielded a crossover point of about \(0.517\), splitting the domain nearly in the middle. With these parameters, the relative error of the approximation is at most about \(\frac{3}{100,000}\). The final approximation on the interval \([0, 1]\) is: $$\begin{cases} -\frac{945}{42240}x^{11} - \frac{105}{3456}x^9 - \frac{15}{336}x^7 - \frac{3}{40}x^5 - \frac{1}{6}x^3 - x + \frac{\pi}{2} & \text{ if } x < 0.5171767771244049 \\ \sqrt{2x_-}\left(\frac{63}{90112}x_-^5 + \frac{35}{18432}x_-^4 + \frac{5}{896}x_-^3 + \frac{3}{160}x_-^2 + \frac{1}{12}x_- + 1\right) & \text{ otherwise} \end{cases}$$ where \(x_- = 1-x\). Note that the Puiseux approximation requires calculating only one fractional power. We don't depict the final approximation because it is indistinguishable from \(\arccos\) at any reasonable level of zoom.
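For the curious, here is a small, self-contained sketch of how such a crossover search could be run. This is illustrative code of my own, not the code used for the article; it bisects on the point where the two truncated series have equal absolute error, using Math.acos as the reference:

```java
/** Sketch: locate the crossover where the two approximations' errors meet. */
public class CrossoverSearch {

    // Degree-11 Taylor approximation of acos around 0 (good for small x).
    static double taylor(double x) {
        double x2 = x * x;
        double p = 945.0 / 42240 * x2 + 105.0 / 3456;
        p = p * x2 + 15.0 / 336;
        p = p * x2 + 3.0 / 40;
        p = p * x2 + 1.0 / 6;
        p = p * x2 + 1;
        return Math.PI / 2 - x * p;
    }

    // Six-term Puiseux approximation of acos around 1 (good for x near 1).
    static double puiseux(double x) {
        double t = 1 - x;
        double p = 63.0 / 90112 * t + 35.0 / 18432;
        p = p * t + 5.0 / 896;
        p = p * t + 3.0 / 160;
        p = p * t + 1.0 / 12;
        p = p * t + 1;
        return Math.sqrt(2 * t) * p;
    }

    public static void main(String[] args) {
        // The Taylor error grows with x while the Puiseux error shrinks,
        // so their difference changes sign exactly once on (0, 1).
        double lo = 0, hi = 1;
        for (int i = 0; i < 60; i++) {
            double mid = 0.5 * (lo + hi);
            double exact = Math.acos(mid);
            double diff = Math.abs(taylor(mid) - exact) - Math.abs(puiseux(mid) - exact);
            if (diff < 0) lo = mid; else hi = mid;
        }
        System.out.println("crossover ~ " + lo); // expect a value near 0.517
    }
}
```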
The Java implementation of \(\arccos\) below is a whopping 28 times as fast as Math.acos and 9 times as fast as Apache Commons Math's FastMath.acos for uniformly random inputs in \((-1, 1)\). The \(\arcsin\) approximation performs similarly relative to the implementations in Math and FastMath. The \(\arctan\) approximation is only 2 and 1.5 times as fast as the implementations in Math and FastMath, respectively.
If 6 terms do not provide the right accuracy for your use case, note that each additional term cuts the error by roughly a factor of 10 while adding just a few arithmetic operations, so very high accuracy can be had while maintaining excellent performance.
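For reference, both series have simple closed forms for their coefficients, so generating additional terms is mechanical: $$\arccos(x) = \frac{\pi}{2} - \sum_{n=0}^{\infty}\frac{\binom{2n}{n}}{4^n\,(2n+1)}\,x^{2n+1}, \qquad \arccos(1-t) = \sqrt{2t}\,\sum_{n=0}^{\infty}\frac{\binom{2n}{n}}{8^n\,(2n+1)}\,t^{n}.$$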
Here is complete Java code implementing all three inverse trigonometric functions, as well as the two-argument inverse tangent function atan2, on their full domains.
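A minimal sketch of what such code might look like is given below. It is built directly on the piecewise approximation above, but the class name (FastTrig), the \(\arctan\) reduction, and the edge-case handling are my own illustrative choices, not a definitive implementation:

```java
/**
 * Sketch of a fast inverse-trig class built on the piecewise acos
 * approximation above. Names and edge-case handling are illustrative.
 */
public final class FastTrig {

    private static final double CROSSOVER = 0.5171767771244049;

    /** Approximate acos; inputs are assumed to lie in [-1, 1]. */
    public static double acos(double x) {
        // Use the point symmetry acos(-x) = pi - acos(x) to reduce to [0, 1].
        return x >= 0 ? acosNonNegative(x) : Math.PI - acosNonNegative(-x);
    }

    private static double acosNonNegative(double x) {
        if (x < CROSSOVER) {
            // Degree-11 Taylor polynomial around 0, via Horner's rule in x^2.
            double x2 = x * x;
            double p = 945.0 / 42240 * x2 + 105.0 / 3456;
            p = p * x2 + 15.0 / 336;
            p = p * x2 + 3.0 / 40;
            p = p * x2 + 1.0 / 6;
            p = p * x2 + 1;
            return Math.PI / 2 - x * p;
        } else {
            // Six-term Puiseux expansion around 1; only one sqrt is needed.
            double t = 1 - x;
            double p = 63.0 / 90112 * t + 35.0 / 18432;
            p = p * t + 5.0 / 896;
            p = p * t + 3.0 / 160;
            p = p * t + 1.0 / 12;
            p = p * t + 1;
            return Math.sqrt(2 * t) * p;
        }
    }

    /** asin via the identity asin(x) = pi/2 - acos(x). */
    public static double asin(double x) {
        return Math.PI / 2 - acos(x);
    }

    /**
     * atan via the identity atan(x) = asin(x / sqrt(1 + x^2)).
     * Note: extremely large |x| (where x * x overflows) would need extra care.
     */
    public static double atan(double x) {
        return asin(x / Math.sqrt(1 + x * x));
    }

    /**
     * Two-argument arctangent with the usual quadrant conventions
     * (signed-zero and NaN subtleties are ignored in this sketch).
     */
    public static double atan2(double y, double x) {
        if (x > 0) {
            return atan(y / x);
        } else if (x < 0) {
            return y >= 0 ? atan(y / x) + Math.PI : atan(y / x) - Math.PI;
        } else { // x == 0: result lies on the y-axis
            if (y > 0) return Math.PI / 2;
            if (y < 0) return -Math.PI / 2;
            return 0;
        }
    }
}
```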